Data Type Detection for Choosing an Appropriate Correlation Coefficient in the Bivariate Case

Author

  • Anastasiia Yu. Timofeeva
Abstract

Data scientists usually determine the data type from the nature of the variables and then select an appropriate correlation measure. However, this is inconvenient and very time-consuming in data-intensive domains. I propose to detect the types of the variables and choose the appropriate correlation coefficient automatically, so that the statistical procedure of estimating correlations from mixed data can be automated. This should reduce the time spent on correlation analysis and increase the accuracy of the estimated correlation coefficients. The continuity index is used to detect whether a variable is continuous or ordered categorical. Based on a simulation study, I have estimated the cutoff level of the continuity index for choosing among the Pearson, the polychoric, and the polyserial correlation coefficients.
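The selection step described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not define the continuity index or report the cutoff value, so `continuity_proxy` (the share of distinct values in the sample) and the default `cutoff=0.5` are hypothetical placeholders, not the author's definitions. Only the final mapping reflects the standard use of the three coefficients: Pearson for two continuous variables, polychoric for two ordered categorical variables, and polyserial for one of each.

```python
from typing import Sequence


def continuity_proxy(values: Sequence[float]) -> float:
    """Hypothetical stand-in for the continuity index.

    Returns the share of distinct values among the observations.
    This is NOT the definition used in the paper, which the abstract
    does not give.
    """
    if len(values) == 0:
        raise ValueError("values must not be empty")
    return len(set(values)) / len(values)


def choose_coefficient(
    x: Sequence[float], y: Sequence[float], cutoff: float = 0.5
) -> str:
    """Pick a correlation coefficient name for the pair (x, y).

    `cutoff` is a placeholder; the paper estimates its own cutoff
    by simulation, and that value is not stated in the abstract.
    """
    x_continuous = continuity_proxy(x) >= cutoff
    y_continuous = continuity_proxy(y) >= cutoff
    if x_continuous and y_continuous:
        return "pearson"      # both treated as continuous
    if not x_continuous and not y_continuous:
        return "polychoric"   # both treated as ordered categorical
    return "polyserial"       # one continuous, one ordered categorical


# Example: a finely graded variable paired with a 3-level ordinal one.
x = [0.1 * i for i in range(20)]
y = [i % 3 for i in range(20)]
print(choose_coefficient(x, y))
```

With real data the proxy and the cutoff would need to be replaced by the paper's continuity index and its simulated cutoff level.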

Similar resources

Test of the Correlation Coefficient in Bivariate Normal Populations Using Ranked Set Sampling

Ranked Set Sampling (RSS) is a statistical method for data collection that leads to more efficient estimators than competitors based on Simple Random Sampling (SRS). We consider testing the correlation coefficient of bivariate normal distribution based on Bivariate RSS (BVRSS). Under one-sided and two-sided alternatives, we show that the new tests based on BVRSS are more powerful than the usua...

Cauchy Regression and Confidence Intervals for the Slope

This paper uses computer simulations to verify several features of the Greatest Deviation (GD) nonparametric correlation coefficient. First, its asymptotic distribution is used in a simple linear regression setting where both variables are bivariate. Second, the distribution free property of GD is demonstrated using both the bivariate normal and bivariate Cauchy distributions. Third, the robust...

Estimation of Count Data using Bivariate Negative Binomial Regression Models

Negative binomial regression model (NBR) is a popular approach for modeling overdispersed count data with covariates. Several parameterizations have been performed for NBR, and the two well-known models, negative binomial-1 regression model (NBR-1) and negative binomial-2 regression model (NBR-2), have been applied. Another parameterization of NBR is negative binomial-P regression mode...

A blended model for estimating of missing precipitation data (Case study of Tehran - Mehrabad station)

Meteorological stations usually contain some missing data for different reasons. There are several traditional methods for completing data; among them, bivariate and multivariate linear and non-linear correlation analysis, double mass curve, ratio and difference methods, moving average, and probability density functions are commonly used. In this paper a blended model comprising the bivariate expo...

Stochastic simulation of bivariate gamma distribution: a frequency-factor based approach

A frequency-factor based approach for stochastic simulation of bivariate gamma distribution is proposed. The approach involves generation of bivariate normal samples with a correlation coefficient consistent with the correlation coefficient of the corresponding bivariate gamma samples. Then the bivariate normal samples are transformed to bivariate gamma samples using the well-known general equa...


Publication date: 2017